Shimkin, Lecture 4: Reinforcement Learning – Basic Algorithms
Author
Abstract
Our agent usually has only partial knowledge of its environment, and therefore will use some form of learning scheme, based on the observed signals. To start with, the agent needs some parametric model of the environment. We shall use the model of a stationary MDP, with given state space and action space. However, the state transition matrix P = (p(s′|s, a)) and the immediate reward function r = (r(s, a, s′)) may not be given. We shall further assume that the observed signal is indeed the state of the dynamic process (fully observed MDP), and that the reward signal is the immediate reward r_t, with mean r(s_t, a_t).
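To make this setting concrete, here is a minimal sketch (not taken from the notes) of a tabular Q-learning agent interacting with such an unknown MDP: the agent never reads P or r directly, it only observes the transitions (s_t, a_t, r_t, s_{t+1}). The environment class, its randomly drawn dynamics, and all hyperparameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: the agent only sees (next state, reward), never P or r.
class UnknownMDP:
    def __init__(self, n_states=5, n_actions=2):
        self.n_states, self.n_actions = n_states, n_actions
        # Hidden from the agent: random transition kernel and mean rewards.
        self._P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        self._r = rng.uniform(0, 1, size=(n_states, n_actions))
        self.state = 0

    def step(self, a):
        s = self.state
        s_next = rng.choice(self.n_states, p=self._P[s, a])
        reward = self._r[s, a] + 0.1 * rng.standard_normal()  # noisy, mean r(s, a)
        self.state = s_next
        return s_next, reward

# Tabular Q-learning: learns from observed transitions only.
def q_learning(env, episodes=200, horizon=100, gamma=0.95, alpha=0.1, eps=0.1):
    Q = np.zeros((env.n_states, env.n_actions))
    for _ in range(episodes):
        env.state = 0
        s = env.state
        for _ in range(horizon):
            a = rng.integers(env.n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next, r = env.step(a)
            # Stochastic approximation of the Bellman optimality operator.
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q

if __name__ == "__main__":
    Q = q_learning(UnknownMDP())
    print("Greedy policy:", Q.argmax(axis=1))
```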
Similar References
Multigrid Algorithms for Temporal Difference Reinforcement Learning
We introduce a class of Multigrid-based temporal difference algorithms for reinforcement learning with linear function approximation. Multigrid methods are commonly used to accelerate the convergence of iterative numerical computation algorithms. The proposed Multigrid-enhanced TD(λ) algorithms accelerate the convergence of the basic TD(λ) algorithm while keeping essentially the same per-...
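For reference, the following is a minimal sketch of the basic TD(λ) update with linear function approximation that such multigrid schemes aim to accelerate; the feature map, step size, and toy trajectory are illustrative assumptions, and the multigrid acceleration itself is not shown.

```python
import numpy as np

def td_lambda_linear(features, rewards, gamma=0.95, lam=0.8, alpha=0.01):
    """One pass of TD(lambda) with linear value function V(s) = w . phi(s).

    features: array of shape (T+1, d), phi(s_0) ... phi(s_T)
    rewards:  array of shape (T,), rewards observed along the trajectory
    """
    T, d = len(rewards), features.shape[1]
    w = np.zeros(d)         # linear weights
    z = np.zeros(d)         # eligibility trace
    for t in range(T):
        phi, phi_next = features[t], features[t + 1]
        delta = rewards[t] + gamma * phi_next @ w - phi @ w   # TD error
        z = gamma * lam * z + phi                             # accumulate trace
        w = w + alpha * delta * z                             # TD(lambda) update
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Illustrative random trajectory features and rewards.
    feats = rng.standard_normal((101, 4))
    rews = rng.uniform(0, 1, size=100)
    print(td_lambda_linear(feats, rews))
```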
A Geometric Approach to Multi-Criterion Reinforcement Learning
We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown, and furthermore may be affected by the actions of other agents, actions that are observed but cannot be predicted beforehand. We capture this situation using a stochastic game model, where the learning ...
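As a loose illustration of the geometric (approachability-style) viewpoint, the sketch below steers the running average of a vector-valued reward toward a target set by always moving toward its closest point. It uses a simplified stateless (vector-reward bandit) setting rather than the paper's stochastic game model; the target set, actions, and steering rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy vector-reward bandit: 3 actions, 2 reward criteria (means hidden from the learner).
TRUE_MEANS = np.array([[0.8, 0.1],
                       [0.1, 0.8],
                       [0.5, 0.5]])

def pull(a):
    return TRUE_MEANS[a] + 0.05 * rng.standard_normal(2)

def project_to_box(x, lo, hi):
    """Closest point of the target set (here a box [lo, hi]^2) to x."""
    return np.clip(x, lo, hi)

def approach_target(steps=5000, lo=0.4, hi=1.0):
    est = np.zeros_like(TRUE_MEANS)      # empirical mean reward per action
    counts = np.zeros(len(TRUE_MEANS))
    avg = np.zeros(2)                    # running average reward vector
    for t in range(1, steps + 1):
        closest = project_to_box(avg, lo, hi)
        direction = closest - avg        # steer the average toward the target set
        if np.allclose(direction, 0):
            direction = np.ones(2)       # already inside: any direction works
        # Greedy "best response" in the steering direction (explore early on).
        a = rng.integers(3) if t <= 30 else int((est @ direction).argmax())
        r = pull(a)
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]
        avg += (r - avg) / t
    return avg

if __name__ == "__main__":
    print("final average reward vector:", approach_target())
```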
Basis Function Adaptation in Temporal Difference Reinforcement Learning
We examine methods for on-line optimization of the basis functions for temporal difference Reinforcement Learning algorithms. We concentrate on architectures with a linear parameterization of the value function. Our methods optimize the weights of the network while simultaneously adapting the parameters of the basis functions in order to decrease the Bellman approximation error. A gradient-based...
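A minimal sketch of this kind of joint adaptation (not the paper's algorithm): a one-dimensional value function with Gaussian RBF features, where the linear weights and the basis centers are both updated by semi-gradient descent on the squared TD error, used here as a stand-in for the Bellman approximation error. The feature form, step sizes, and toy random-walk data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(s, centers, width):
    """Gaussian radial basis features phi(s) for a scalar state s."""
    return np.exp(-(s - centers) ** 2 / (2.0 * width ** 2))

def adapt_basis(transitions, n_basis=5, gamma=0.95, alpha_w=0.05, alpha_c=0.01):
    """Jointly update linear weights w and basis centers by semi-gradient
    descent on the squared TD error."""
    centers = np.linspace(0.0, 1.0, n_basis)   # adaptable basis parameters
    width = 0.2
    w = np.zeros(n_basis)
    for s, r, s_next in transitions:
        phi, phi_next = rbf(s, centers, width), rbf(s_next, centers, width)
        delta = r + gamma * phi_next @ w - phi @ w          # TD error
        # Semi-gradient steps: first the weights, then the feature centers.
        w += alpha_w * delta * phi
        centers += alpha_c * delta * w * phi * (s - centers) / width ** 2
    return w, centers

if __name__ == "__main__":
    # Illustrative random-walk transitions on [0, 1] with reward = next state.
    data, s = [], 0.5
    for _ in range(2000):
        s_next = float(np.clip(s + 0.1 * rng.standard_normal(), 0.0, 1.0))
        data.append((s, s_next, s_next))
        s = s_next
    w, c = adapt_basis(data)
    print("weights:", np.round(w, 3))
    print("centers:", np.round(c, 3))
```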
Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning
The commonly used Q-learning algorithm combined with function approximation induces systematic overestimations of state-action values. These systematic errors might cause instability, poor performance and sometimes divergence of learning. In this work, we present the AVERAGED TARGET DQN (ADQN) algorithm, an adaptation to the DQN class of algorithms which uses a weighted average over past learne...
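The averaging idea can be sketched in tabular form without any deep network: keep the K most recent Q-table snapshots and bootstrap from their average, so the target is less sensitive to the noise of the latest estimate. This is an illustrative analogue, not the paper's DQN implementation; the toy environment and all parameter values are assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(4)

def averaged_q_learning(env_step, n_states, n_actions, k=5,
                        steps=20000, gamma=0.95, alpha=0.1,
                        eps=0.1, snapshot_every=500):
    """Tabular analogue of the averaged-target idea: bootstrap from the
    average of the K most recent Q snapshots to reduce target variance."""
    Q = np.zeros((n_states, n_actions))
    snapshots = deque([Q.copy()], maxlen=k)
    s = 0
    for t in range(1, steps + 1):
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r = env_step(s, a)
        Q_avg = np.mean(list(snapshots), axis=0)    # averaged target estimate
        target = r + gamma * Q_avg[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])
        if t % snapshot_every == 0:
            snapshots.append(Q.copy())
        s = s_next
    return Q

if __name__ == "__main__":
    # Toy 4-state ring: action 0 stays, action 1 moves forward; reward near state 3.
    def ring_step(s, a):
        s_next = (s + a) % 4
        return s_next, float(s_next == 3) + 0.05 * rng.standard_normal()

    Q = averaged_q_learning(ring_step, n_states=4, n_actions=2)
    print(np.round(Q, 2))
```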
Reinforcement Learning in Neural Networks: A Survey
In recent years, research on reinforcement learning (RL) has focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. The advantage of using neural networks is that they enable RL to search for optimal policies more efficiently in several real-life applicat...